OPERATING METHOD FOR AN AUTOMATED LANGUAGE
RECOGNIZER INTENDED FOR THE SPEAKER-INDEPENDENT
LANGUAGE RECOGNITION OF WORDS IN DIFFERENT LANGUAGES

BACKGROUND 



The present disclosure relates to an operating method for an automatic language
recognizer for speaker-independent language recognition of words of different
languages, and to a corresponding automatic language recognizer.



For phoneme-based language recognition, a language-recognition
vocabulary is required that contains phonetic descriptions of all the words to be
recognized. Typically, words are represented in the vocabulary by sequences or
chains of phonemes. During a language recognition process, a search is conducted
for the best path through the various phoneme sequences found in the vocabulary.
This search can, for example, take place by means of the Viterbi algorithm. For
continuous language recognition, the probabilities for transitions between words
can also be modeled and included in the Viterbi algorithm.
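As a rough illustration of such a best-path search, the following Python sketch
scores each vocabulary entry by a Viterbi alignment of its phoneme sequence
against a series of acoustic frames and selects the best-scoring word. It is a
minimal sketch under stated assumptions, not the recognizer of this disclosure:
the frame scores, the toy vocabulary and the names viterbi_score, recognize and
frame_favoring are hypothetical, each phoneme is modeled as a single state, and
transition probabilities are taken as uniform.

```python
import math

NEG_INF = -math.inf


def viterbi_score(phonemes, frames):
    """Return the best log-score for aligning `phonemes` to `frames`.

    `frames` is a list of dicts mapping a phoneme symbol to its
    log-likelihood in that frame (hypothetical acoustic scores).
    Each phoneme is one state that must cover at least one frame,
    and states are visited strictly left to right.
    """
    if not phonemes or len(frames) < len(phonemes):
        return NEG_INF
    # best[j] = best score of any path that is in state j at the current frame
    best = [NEG_INF] * len(phonemes)
    best[0] = frames[0].get(phonemes[0], NEG_INF)
    for frame in frames[1:]:
        new = [NEG_INF] * len(phonemes)
        for j, phoneme in enumerate(phonemes):
            stay = best[j]
            advance = best[j - 1] if j > 0 else NEG_INF
            prev = max(stay, advance)
            if prev > NEG_INF:
                new[j] = prev + frame.get(phoneme, NEG_INF)
        best = new
    # A valid path must end in the last phoneme of the word
    return best[-1]


def recognize(vocabulary, frames):
    """Return the vocabulary word whose phoneme sequence scores best."""
    return max(vocabulary, key=lambda word: viterbi_score(vocabulary[word], frames))


def frame_favoring(phoneme, inventory=("a", "n", "i")):
    """Build one toy frame in which `phoneme` is the most likely symbol."""
    return {p: math.log(0.7 if p == phoneme else 0.1) for p in inventory}


# Toy vocabulary: each word is a chain of phoneme symbols
vocabulary = {"anna": ["a", "n", "a"], "nina": ["n", "i", "n", "a"]}
frames = [frame_favoring(p) for p in ["a", "n", "n", "a"]]
print(recognize(vocabulary, frames))
```

With these toy frames the entry "anna" aligns better than "nina", because the
second phoneme of "anna" can cover two consecutive frames.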


Phonetic transcriptions of the words to be recognized form the basis of
phoneme-based language recognition. Therefore, at the start of a phoneme-based
language recognition process, the first task is to obtain phonetic transcripts for the
words. Phonetic transcripts can be generally defined as the phonetic descriptions of
words from a target vocabulary. Obtaining phonetic transcripts is particularly
relevant for words that are not known to the language recognizer.



Mobile or cordless telephones are known that enable speaker-dependent
name selection. In this case, a user of such a telephone must train the entries
contained in the electronic telephone book of the telephone in order to be able to
subsequently use the name selection by spoken word. Normally, no other user can
use this feature because the speaker-dependent name selection is suitable for only
one person, i.e. for the person who has trained it. To overcome this problem, the
entries in the electronic telephone book can be converted to phonetic transcripts.


To determine the phonetic transcript from a written word, for example from
a telephone book entry, various approaches are known in the art. One example is a
dictating system that is used with a PC. In dictating systems of this kind, a
lexicon of typically more than 10,000 words, with an allocation of letter sequences
to phoneme sequences, is normally stored. Because a lexicon of this kind
requires a very high storage capacity, it is not practical for mobile terminal devices
such as mobile or cordless telephones to wholly incorporate this configuration.

Systems are also known whereby the conversion of a word to its phonetic
transcript is rule-based, or takes place using specially trained neural networks. As
with the lexicon, this approach also has the disadvantage that the language in which
the phoneme sequences are to be realized must be specified. However, names from
different languages may be present, particularly in electronic telephone books. On
a mobile device, converting words from different languages would be burdensome
to implement fully under the above configuration.

Other multilingual systems for determining phoneme sequences and 
language recognition have been developed. These systems enable phoneme 
sequences to be created from different languages. 


Under still other configurations, a user speaks the words into a language
recognition system that automatically generates sequences of phonemes from them.
However, for larger vocabularies (e.g., an electronic telephone book with 80
entries), this is no longer acceptable for the user.


SUMMARY OF THE INVENTION

The present disclosure provides an operating method for an
automatic language recognizer for speaker-independent language recognition of
words from various languages, and also a corresponding automatic language
recognizer, that is simple to implement, is particularly suitable for use in mobile
terminal devices and can be realized at reasonable cost.

As an example, a method for voice recognition is provided including the
steps of:

(a) determining the phonetic transcripts of words for N various languages, in
order to obtain N first phoneme sequences per word corresponding to N first
pronunciation variants;

(b) implementing a mapping of the phonemes of each language to the
relevant phoneme set of the mother tongue;

(c) applying the mapping implemented in step (b) to the N first phoneme
sequences for each word determined in step (a), whereby for each word N second
phoneme sequences corresponding to N second pronunciation variants are obtained
that can be recognized by means of a mother tongue language recognizer; and

(d) creating a language recognition vocabulary with the N second
phoneme sequences per word, obtained in the preceding step, for the mother tongue
language recognizer.
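Steps (a) to (d) can be sketched in Python as follows. This is an illustrative
sketch only: the function build_vocabulary, the toy letter-based transcribers
transcribe_de and transcribe_en, the phoneme symbols and the mapping tables are
all hypothetical stand-ins, and the rule that phonemes absent from a mapping
table are passed through unchanged is an assumption.

```python
from typing import Callable, Dict, List

Phonemes = List[str]


def build_vocabulary(
    words: List[str],
    transcribers: Dict[str, Callable[[str], Phonemes]],
    mappings: Dict[str, Dict[str, str]],
) -> Dict[str, Dict[str, Phonemes]]:
    """Return {word: {language: second phoneme sequence}}."""
    vocabulary: Dict[str, Dict[str, Phonemes]] = {}
    for word in words:
        variants: Dict[str, Phonemes] = {}
        for language, transcribe in transcribers.items():
            # Step (a): first phoneme sequence in the source language
            first = transcribe(word)
            # Steps (b) and (c): map each phoneme to the mother-tongue set.
            # Assumption: phonemes missing from the table are shared and kept.
            table = mappings.get(language, {})
            variants[language] = [table.get(p, p) for p in first]
        # Step (d): store the N second phoneme sequences for the word
        vocabulary[word] = variants
    return vocabulary


# Hypothetical toy transcribers: one symbol per letter, with a few overrides.
def transcribe_de(word: str) -> Phonemes:
    overrides = {"w": "v", "v": "f"}
    return [overrides.get(c, c) for c in word.lower()]


def transcribe_en(word: str) -> Phonemes:
    overrides = {"a": "ae"}
    return [overrides.get(c, c) for c in word.lower()]


TRANSCRIBERS = {"de": transcribe_de, "en": transcribe_en}
# Hypothetical mapping tables onto a German (mother-tongue) phoneme set.
MAPPINGS = {"de": {}, "en": {"w": "v", "ae": "E"}}

vocabulary = build_vocabulary(["Wagner"], TRANSCRIBERS, MAPPINGS)
print(vocabulary["Wagner"])
```

Keeping the variants keyed by source language is a design choice of this sketch;
it makes later pruning by language or by distance straightforward.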

As another example, a system for voice recognition is provided including: a
mother tongue language recognizer; a first processing module for determining the
phonetic transcripts of words for N various languages in each case, in order to
obtain N first phoneme sequences for each word corresponding to N first
pronunciation variants; a second processing module for implementing a mapping of
the phonemes of each language to the particular phoneme set of the mother tongue;
a third processing module for applying the mapping, implemented by means of the
second processing module, to the N first phoneme sequences for each word
determined by means of the first processing module, with N second phoneme
sequences corresponding to N second pronunciation variants being obtained per
word, that can be recognized by means of the mother tongue language recognizer;
and a fourth processing module for creating a language recognition vocabulary
with the N second phoneme sequences per word, obtained by the third processing
module, for the mother tongue language recognizer.

BRIEF DESCRIPTION OF THE DRAWINGS 

The invention and its wide variety of potential embodiments will be more
readily understood through the following detailed description, with reference to the
accompanying drawing in which:

FIG. 1 is a schematic flow diagram of the input phase for creation of a
language recognition vocabulary in accordance with an exemplary embodiment of
the invention.

DETAILED DESCRIPTION 

Under an exemplary embodiment, phonetic transcripts of words for N
various languages are determined and then reprocessed and applied to a
phoneme-based monolingual language recognizer. This procedure works under the
assumption that a user of the voice recognizer normally speaks in his/her mother
tongue. The user may also pronounce foreign-language words, such as names, with
a mother-tongue nuance (i.e. an accent) that can be roughly modeled by a
mother-tongue language recognizer. The operating method is therefore based on a
language defined as the mother tongue.

Each language can be described with different phonemes suitable for
the particular language. It is known, however, that many phonemes in different
languages resemble one another. An example of this is the "p" in English and
German.

This fact is utilized in multilingual language recognition. In this case a
single Hidden Markov model is created for the collection of languages, by means of
which several languages can be recognized simultaneously. However, this leads to
a very large Hidden Markov model with a lower recognition rate than a
monolingual Hidden Markov model. Furthermore, if the collection of languages is
extended, for example by a secondary language, a new Hidden Markov model has
to be created, which is very expensive.

According to an exemplary embodiment, in a first step of the input phase
for creation of a language recognition vocabulary of an operating procedure of an
automated language recognizer for speaker-independent language recognition of
words from various languages, particularly for the recognition of names from
various languages, the phonetic transcripts of words for N various languages are
determined in each case, in order to obtain N first phoneme sequences per word
corresponding to N first pronunciation variants. In a second step, the similarities
between the languages are utilized. To do this, a mapping of the phonemes of each
language to the particular phoneme set of the mother tongue is implemented.
Furthermore, in a third step, the implemented mapping is applied to the N first
phoneme sequences determined in the first step for each word. In this way, N second
phoneme sequences corresponding to N second pronunciation variants are obtained
for each word. After a language recognition vocabulary has been created using the
N second phoneme sequences per word obtained in the preceding step, words from
the N various languages can then be recognized by means of the mother-tongue
language recognizer.

A look-up method in a lexicon configuration fails with mobile
terminal devices because of the large memory requirement, and multilingual
language recognition, which is optimized for a fixed set of languages, requires new
Hidden Markov models to be created and optimized for each new language. By
contrast, by means of grapheme/phoneme conversion into several languages in
accordance with the invention, a multilingual system is created that can be
implemented with relatively simple means. In addition to the grapheme-to-phoneme
conversion, a mapping between the individual languages is implemented. The
phoneme sequence determination and the subsequent mapping normally run
offline on a device, for example a mobile telephone, a personal digital assistant or
a personal computer with corresponding software, and are therefore not
time-critical. The resources required for this can be held in an internal/external
memory.

Because the language recognition vocabulary created by means of the
aforementioned procedure includes N pronunciation variants for each word, the
search effort during language recognition can be great. To reduce this, a further
step can be introduced under the exemplary embodiment, which is performed after
generation of the N second phoneme sequences per word and before the creation of
the language recognition vocabulary. In this step, the N second phoneme sequences
corresponding to the N second pronunciation variants of each word are processed,
in that each second phoneme sequence is analyzed and classified by means of a
suitable distance measure, particularly the Levenshtein distance, and the N second
phoneme sequences of each word are reduced to a few, preferably two to three,
phoneme sequences, in that the pronunciation variants that are least similar to the
pronunciation variant of the mother tongue are omitted. Simply expressed, the
least important pronunciation variants are omitted by this reduction, thus reducing
the search effort during language recognition.
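The reduction step can be illustrated with a short Python sketch that computes
the Levenshtein distance between phoneme sequences and keeps only the variants
closest to the mother-tongue variant. The function names levenshtein and
reduce_variants, the language codes and the phoneme symbols are hypothetical,
and ranking purely by distance to the mother-tongue variant is one possible
reading of the classification described above, not a definitive implementation.

```python
def levenshtein(a, b):
    """Edit distance between two phoneme sequences (lists of symbols)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (x != y),  # substitution (free if symbols match)
            ))
        prev = cur
    return prev[-1]


def reduce_variants(variants, mother_tongue, keep=3):
    """Keep the `keep` variants closest to the mother-tongue variant."""
    reference = variants[mother_tongue]
    ranked = sorted(variants, key=lambda lang: levenshtein(variants[lang], reference))
    return {lang: variants[lang] for lang in ranked[:keep]}


# Hypothetical second phoneme sequences for one word, keyed by source language
variants = {
    "de": ["v", "a", "g", "n", "e", "r"],
    "it": ["v", "a", "n", "j", "e", "r"],
    "cs": ["v", "a", "g", "n", "E", "r"],
    "el": ["u", "a", "G", "n", "e", "R"],
}
reduced = reduce_variants(variants, mother_tongue="de", keep=3)
print(list(reduced))
```

In this toy data the variant derived from "el" differs from the mother-tongue
variant in three symbols and is therefore the one that is omitted.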

A further reduction in cost can be achieved in that a language identification
and reduction is carried out before the first step. As part of this language
identification, the probability of each word to be recognized belonging to each of
the N various languages is determined. Using the results of this language
identification, the number of languages to be processed in the first step of the
method is reduced, preferably to two or three different languages. The
languages with the lowest probability are not further processed. For a specific word,
the result of the language identification can, for example, be as follows: "German
55%, UK English 16%, US English 14%, Swedish 3%, ...". Under this example, if
only three languages are desired, the Swedish language is omitted, i.e. not further
processed.
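The language reduction can be sketched in Python as a simple ranking by
probability. The function name reduce_languages is hypothetical, and only the
four values named in the example above are used; the remaining languages of
that example are elided in the text and are therefore not filled in here.

```python
def reduce_languages(probabilities, keep=3):
    """Return the `keep` most probable languages, most probable first."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return [language for language, _ in ranked[:keep]]


# Result of a (hypothetical) language identifier for one word
identified = {
    "German": 0.55,
    "UK English": 0.16,
    "US English": 0.14,
    "Swedish": 0.03,
}
print(reduce_languages(identified, keep=3))
```

Only the languages returned by this step would then be passed on to the
determination of the phonetic transcripts.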

The determination of the phonetic transcripts in the first step of the method
preferably takes place by means of at least one neural network. Neural networks
have proved suitable for determining phonetic transcripts from written words,
because they produce good results with regard to accuracy, and particularly with
regard to the speed of processing, and can be easily implemented, particularly in
software.

A Hidden Markov model, particularly one that has been created for the 
language defined as a mother tongue, is suitable for use as a mother tongue 
language recognizer. 


The exemplary embodiment of the invention also relates to a language
recognizer for speaker-independent language recognition of words from various
languages, particularly for recognizing names from various languages. In this case,
one of the various languages is defined as the mother tongue. The language
recognizer includes:

a mother tongue language recognizer,

a first processing module for determining the phonetic transcripts of
words for N various languages in each case, in order to obtain N first phoneme
sequences corresponding to N first pronunciation variants per word,

a second processing module for implementing a mapping of the
phonemes of each language to the particular phoneme set of the mother tongue,

a third processing module for applying the mapping, implemented by
the second processing module, to the N first phoneme sequences for each word,
determined with the first processing module, whereby N second phoneme sequences
corresponding to N second pronunciation variants are obtained per word, that can
be recognized by the mother tongue language recognizer, and

a fourth processing module for creating a language recognition
vocabulary with the N second phoneme sequences per word obtained by the third
processing module for the mother tongue language recognizer.

Under a preferred embodiment, the automatic language recognizer has a
fifth processing module for processing the N second phoneme sequences
corresponding to the N second pronunciation variants of each word. The fifth
processing module is designed in such a way that each second phoneme sequence is
analyzed and classified using a suitable distance measure, particularly the
Levenshtein distance, and the N second phoneme sequences of each word are
reduced to a few, preferably two to three, phoneme sequences.

Furthermore, the automatic language recognizer can have a language
identifier and a language reducer. The language identifier is connected before the
first processing module and, for each word to be recognized, it determines the
probability of it belonging to each of the N different languages. The language
reducer reduces the number of languages to be processed by the first processing
module, preferably down to two to three different languages, so that the languages
with the lowest probability are not further processed. The language identifier and
language reducer substantially reduce the processing effort of the automatic
language recognizer, both in the input phase and in the recognition phase.

Preferably, the first processing module has at least one neural network for 
determining the phonetic transcripts. 


Furthermore, the mother tongue language recognizer has, in a preferred 
form of embodiment, a Hidden Markov model that has been created for the 
language defined as the mother tongue. 

Turning to FIG. 1, a speaker-related name is selected on a mobile telephone
using the names from a telephone book, for a German-speaking user. In addition to
the mainly German-language names, the telephone book also contains some
foreign-language names. A transcriber for the graphemic representation of the
names is set for the German, Italian, Czech, Greek and Turkish languages, i.e. for a
total of N = 5 different languages.


In an initial step S0 of FIG. 1, a language identification of the supplied
words 10 or entries in the telephone book is undertaken. More precisely, each
individual word is analyzed with regard to the probability of it belonging to each of
the five languages. If, for example, a German name is being processed, the
probability for German is very high. For the other four languages, i.e. Italian,
Czech, Greek and Turkish, the probability is much lower. Using the probabilities
determined per word, the language with the lowest probability is omitted during
subsequent processing. In this example, this means that in the succeeding processing
operation only four languages, instead of five, have to be processed.

In a first step S1 of FIG. 1, the phonetic transcript for each word is
determined for each of the four different languages. In this way, four first phoneme
sequences corresponding to four first pronunciation variants are obtained for
each word.

In a second step S2 of FIG. 1, a mapping of the phonemes of each of the
four languages to the particular phoneme set of the mother tongue is implemented.

In a third step S3 of FIG. 1, this mapping is applied to the four first
phoneme sequences 12 obtained in the first step S1. In this way, four second
phoneme sequences 14 corresponding to four second pronunciation variants are
obtained for each word. The four second phoneme sequences 14 can already be
recognized by a mother tongue language recognizer.




Int'l Application No.: PCT/EP03/00003 



Furthermore, to reduce the processing effort for the language
recognizer, each second phoneme sequence is analyzed and classified for each
word using the Levenshtein distance (step S4). A fifth step S5 then takes place, in
which the analyzed and classified second phoneme sequences per word are reduced
to three phoneme sequences.

Finally, in a last step S6, a language recognition vocabulary is created for
the mother tongue language recognizer with the three second phoneme sequences
per word obtained in the fifth step S5. By reducing the phoneme sequences in the
fifth step S5 of the method, the language recognition vocabulary to
be saved and to be analyzed during a language recognition process is substantially
reduced. In a practical application of the language recognizer, this has the
advantages of a lower storage capacity requirement and of faster
processing, because the vocabulary to be searched through is smaller.


After the described procedure has been completed, the user can, by means
of language recognition, make a name selection, i.e. make a language-controlled
call-up of stored telephone numbers using the name of the subscriber, without
first having to explicitly pronounce the name of the subscriber to be called for
training purposes, i.e. without having to "train".

Furthermore, if a user finds that a certain name is not well recognized, the
user can call up the language recognition menu of his mobile telephone and then
select a "name selection" application. By means of this application, the user can
be offered one or several ways of improving the language recognition of a
certain word, or more precisely of a certain name, from the electronic telephone
book of the mobile telephone. Some of these possibilities are briefly explained in
the following by way of example.

1. As an alternate embodiment, the user can again speak the poorly
recognized or unrecognized word into the mobile telephone and then have it
converted into a phoneme sequence by means of the language recognizer contained
in the mobile telephone. In this case, pronunciation variants previously
determined automatically are either completely or partially removed from the
vocabulary of the language recognizer, depending on their closeness to the newly
determined phoneme sequence.

2. As yet another alternate embodiment, the user can have a kind of
phonetic transcription of the poorly recognized or unrecognized entry in the
electronic telephone book shown on the display of the mobile telephone. If, for
example, there is a poor match to the user's pronunciation, the user can edit this
phonetic transcription. For example, by an automatic transcription of the
entry "Jacques Chirac", "Jakwes Shirak" can be stored as a phonetic transcription.
If this phonetic transcription now appears incorrect to the user, he can edit it using
his mobile telephone, for example to "Zhak Shirak". The system can then
determine the phonetic description again and reenter it in the language recognition
vocabulary. This should enable the automatic language recognition to function
reliably.

3. Also, by explicitly specifying the language from which a poorly recognized
or even unrecognized name originates, the user can substantially improve the
recognition of that name. In such a case, all the pronunciation variants of the name
that are not assigned to the explicitly specified language are removed from the
language recognition vocabulary.
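Possibilities 1 and 3 both amount to pruning the stored pronunciation variants
of one entry, which the following Python sketch illustrates. It is a sketch
under stated assumptions: the function names prune_by_closeness and
keep_language, the phoneme symbols and the threshold of 0.5 are hypothetical,
closeness is measured here with the similarity ratio of Python's difflib rather
than with a measure prescribed by the text, and storing the newly spoken
sequence as a separate "user" variant is likewise an assumption.

```python
from difflib import SequenceMatcher


def prune_by_closeness(variants, spoken, threshold=0.5):
    """Possibility 1: drop stored variants that are not close to `spoken`.

    Closeness is difflib's similarity ratio (assumed measure and threshold).
    """
    kept = {
        language: sequence
        for language, sequence in variants.items()
        if SequenceMatcher(None, sequence, spoken).ratio() >= threshold
    }
    # Assumption: the newly determined sequence is stored as its own variant.
    kept["user"] = list(spoken)
    return kept


def keep_language(variants, language):
    """Possibility 3: keep only the variant of the explicitly chosen language."""
    return {lang: seq for lang, seq in variants.items() if lang == language}


# Hypothetical stored variants for one telephone book entry
variants = {
    "de": ["j", "a", "k", "v", "e", "s"],
    "fr": ["Z", "a", "k"],
    "en": ["dZ", "a", "k", "s"],
}
spoken = ["Z", "a", "k"]

print(list(prune_by_closeness(variants, spoken)))
print(list(keep_language(variants, "fr")))
```

A stricter threshold removes more of the automatically determined variants, up
to the case in which only the newly spoken sequence remains.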


In addition, although the invention is described in connection with mobile
telephones, it should be readily apparent that the invention may be practiced with
any type of communicating device, such as a personal digital assistant or a PC. It is
also understood that the device portions and segments described in the embodiments
above can be substituted with equivalent devices to perform the disclosed methods
and processes. Accordingly, the invention is not limited by the foregoing description
or drawings, but is only limited by the scope of the appended claims.

ABSTRACT OF THE DISCLOSURE 

The invention relates to an operating method for an automated language
recognizer intended for the speaker-independent language recognition of words
from different languages, particularly for recognizing names from different
languages. The method is based on a language defined as the mother tongue and
has an input phase for establishing a language recognizer vocabulary. Phonetic
transcripts are determined for words in various languages in order to obtain
phoneme sequences for pronunciation variants. The phonemes of each language are
then mapped to the relevant phoneme set of the mother tongue to determine
phoneme sequences that correspond to pronunciation variants.
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Marked-Up Version of Substitute Specification 

D e scription 

Op e rating m e thod for an automat e d languag e r e cognizer intended for th e 
5 sp e aker ind e pend e nt languag e r e cognition of words in 5 diff e r e nt languag e s and 
automat e d languag e r e cogniz e r. 

OPERATING METHOD FOR AN AUTOMATED LANGUAGE 
RECOGNIZER INTENDED FOR THE SPEAKER-INDEPENDENT 
LANGUAGE RECOGNITION OF WORDS IN DIFFERENT LANGUAGES 
10 AND AUTOMATED LANGUAGE RECOGNIZE R 

BACKGROUND 

The method relates to an operating method of an automatic language 
15 recognizer for speaker-independent language recognition of words of different 
languages in accordanc e with Claim 1 and a corresponding automatic language 
recognize r in accordance with Claim 6 . 

For phoneme-based language recognition, a language-recognition 
20 vocabulary is n e cessary tha t required, contains th e containing phonetic descriptions 
of all the words to be recognized. This is a basic r e quir e ment for phon e m e bas e d 
languag e r e cognition. Typically, Wwords in this cas e are represented by sequences 
or chains of phonemes in the vocabulary. During a language recognition process, a 
search is conducted for the best path through the- various p honeme sequences found 
25 in the vocabular y is carri e d out . This search can, for example, take place by means 
of the Viterbi algorithms. For continuous language recognition, the probabilities 
for transitions between words can also be modeled and included in the Viterbi 
algorithm. 

30 The-A phonetic transcription for the words to be recognized form the basis 

of the-phoneme-based language recognition. Therefore, at the start of use of a 
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phoneme-based language recognition process, the first order is to obtain qu e stion is 
always how such p honetic transcripts for the word can be obtain e d . Phonetic 
transcripts in this cas e m e ans can be generally defined as the phonetic descriptions 
of words from a target vocabulary. This question is Obtaining phonetic transcripts 
5 particularly relevant for words that are not known to the language recognizer. 

Mobile or cordless telephones are known that enable speaker-dependent 
name selection. In this case, aA user of such a telephone must in this cas e train the 
entries contained in the electronic telephone book of the telephone^ in order to be 
10 able to subsequently use the name selection by language spoken word . Normally, 
no other user can use this feature because the speaker-dependent name selection is 
suitable for only one person, i.e. for the person who has trained the language 
selection. To overcome this problem, the entries in the electronic telephone book 
can be changed to phonetic transcripts. 

15 

To determine the phonetic transcript from a written word, for example from 
a telephone book entry, various approaches are know n in the art . For -One example? 
is a the-dictating systems that ar e g e n e rallyi s used with a PC should be m e ntion e d . 
With dictating systems of this kind a a lexicon of typically more than 10,000 words 
20 with an allocation of letter sequences to the phoneme sequences is normally stored. 
Because a lexicon of this kind requires a very high storage capacity, it is not 
practical for mobile terminal devices such as mobile or cordless telephones to 
wholly incorporate this configuration . 

25 Systems are also known whereby the conversion of a word to its phonetic 

transcript is rule-based^ or takes place using specially trained neural networks. As 
with the lexicon, this method also has the -one disadvantage that the language in 
which the phoneme sequences to be realized must be specified. In any case, names 
from different languages may be present particularly in electronic telephone books. 

30 Conversion would th e n b e impossibl e , or only limited, with tho method described 
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abeve rOn a mobile device, converting words from different languages would be 
burdensome to wholly implement under the above configuration. 

For this purpos e , Other multilingual systems for determining phoneme 
5 sequences and language recognition have been developed. These systems enable 
phoneme sequences to be created from different languages. 

Finally there is on e oth e r solution, i. e . U nder still other configurations, a user 
speaks the words into a language recognition system that , from th e se, automatically 
10 generates sequences of phonemes. However, F for large vocabularies, and also e v e n 
for just a f e w doz e n words such as for e xampl e in (e.g., an electronic telephone 
book with 80 entries), this is no longer acceptable for the user. 

SUMMARY OF THE INVENTION 

15 

The obj e ct of this inv e ntion is ther e for e to propos e an p resent disclosure 
provides an operating system and method e ^for an automatic language recognizer 
for speaker-independent language recognition of words from various languages and 
also a corresponding automatic language recognizer that is simple to implement, is 
20 particularly suitable for use in mobile terminal devices and can be realized at 
reasonable cost. The obj e ct is achi e v e d by an op e rating m e thod with th e f e atures of 
Claim 1 and by an automatic languag e r e cognizer with th e f e atur e s of Claim 6. 

As an example, a method for voice recognition is provided including the 
25 steps of: 

(a) determining the phonetic transcripts of words for N various languages, 
in order to obtain N first phoneme sequences per word corresponding to N first 
pronunciation variants; 

(b) implementing a mapping of the phonemes of each language to the 
30 relevant phoneme set of the mother tongue; 
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(c) using the mapping implemented in step (b) to the N first phoneme 
sequences for each word determined in step (a), whereby for each word N second 
phoneme sequences corresponding to N second pronunciation variants are obtained 
that can be recognized by means of a mother tongue language recognizer; and 
5 (d) creation of a language recognition vocabulary with the N second 

phoneme sequences per word, obtained in the preceding step, for the mother tongue 
language recognizer. 

As another example, a system for voice recognition is provided including: 
10 a mother tongue language recognizer; a first processing module for 

determining the phonetic transcripts of words for N various languages in each case, 
in order to obtain N first phoneme sequences for each word corresponding to N first 
pronunciation variants; a second processing module for implementing a mapping of 
the phonemes of each language to the particular phoneme set of the mother tongue; 
15 a third processing module for applying the mapping, implemented by means of the 
second processing module, to the N first phoneme sequences for each word 
determined by means of the first processing module, with N second phoneme 
sequences corresponding to N second pronunciation variants being obtained per 
word, that can be recognized by means of the mother tongue language recognizer; 
20 and a fourth processing module for creating a language recognizable vocabulary 
with the N second phoneme sequences per word, obtained by the third processing 
module, for the mother tongue language recognizer. 

BRIEF DESCRIPTION OF THE DRAWINGS 

25 

The invention and its wide variety of potential embodiments will be more 
readily understood through the following detailed description, with reference to the 
accompanying drawing in which: 
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FIG. 1 is a schematic flow diagram of the input phase for creation of a 
language recognition vocabulary in accordance with an exemplary embodiment of 
the invention. 

5 DETAILED DESCRIPTION 

The invention is e ss e ntially bas e d on the id e a of det e rminin g Under an 
exemplary embodiment, phonetic transcripts of words for N various languages in 
e ach cas e is determined and then reprocessinged-these and applying them a pplied to 

10 a phoneme-based monolingual language recognizer. This procedure is essentially 
bas e d on the knowledge works under the assumption that a user of the voice 
recognizer normally speaks in his /her mother tongue. H e also pronounces The user 
may also pronounce foreign-language words, such as names, with a mother-tongue 
nuance, {i.e. an accent), that can be roughly modeled by a mother-tongue language 

15 recognizer. The operating method is therefore based on a language defined as the 
mother tongue. 

Each language can thus be described with different phonemes suitable for 
the particular language. It is known, however, that many phonemes in different 
20 languages resemble one another. An example of this is the "p" in English and 
German. 

This fact is utilized in multilingual language recognition. In this case a
single Hidden Markov model is created for the collection of languages, by means of
which several languages can be recognized simultaneously. However, this leads to
a very large Hidden Markov model with a lower recognition rate than a
monolingual Hidden Markov model. Furthermore, if the collection of languages is
extended, for example by a further secondary language, a new Hidden Markov
model has to be created, which is very expensive. The invention avoids this
necessity.

Int'l Application No.: PCT/EPO3/000O3 

According to an exemplary embodiment, in a first step of the input phase for
creation of a language recognition vocabulary of an operating procedure of an
automated language recognizer for speaker-independent language recognition of
words from various languages, particularly for the recognition of names from
various languages, the phonetic transcripts of words are determined for each of
N various languages, in order to obtain N first phoneme sequences per word
corresponding to N first pronunciation variants. In a second step, the
similarities between the languages are utilized. To do this, a mapping
(depiction) of the phonemes of each language onto the particular phoneme set of
the mother tongue is implemented. In a third step, this mapping is applied, for
each word, to the N first phoneme sequences determined in the first step. In this
way, N second phoneme sequences corresponding to N second pronunciation
variants are obtained for each word. After a language-recognition vocabulary
has been created for the mother-tongue language recognizer using the N second
phoneme sequences per word obtained in the preceding step, words from the N
various languages can then be recognized by means of the mother-tongue language
recognizer.
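
The three steps above can be sketched in a few lines of Python. This is only an
illustration under stated assumptions: the phoneme symbols, the per-language
transcriptions and the mapping tables are invented for the example and do not
come from the application. In practice the first phoneme sequences would be
produced by a grapheme-to-phoneme converter, and the name map_to_mother_tongue
is hypothetical.

```python
# Hypothetical illustration of steps one to three; all symbols and
# tables below are invented for the example.

# Step 1 (assumed output): N first phoneme sequences for one word,
# one per source language.
first_sequences = {
    "de": ["S", "i", "R", "a", "k"],
    "it": ["k", "i", "r", "a", "k"],
}

# Step 2: per-language mapping onto the mother-tongue (here German)
# phoneme set. Phonemes missing from a table are assumed to exist
# unchanged in the mother tongue.
phoneme_maps = {
    "de": {},
    "it": {"r": "R"},
}


def map_to_mother_tongue(sequence, mapping):
    """Step 3: replace each phoneme by its mother-tongue counterpart."""
    return [mapping.get(phoneme, phoneme) for phoneme in sequence]


second_sequences = {
    lang: map_to_mother_tongue(seq, phoneme_maps[lang])
    for lang, seq in first_sequences.items()
}

# The vocabulary entry for the word holds the N second phoneme sequences.
vocabulary = {"Chirac": list(second_sequences.values())}
```

Because every second phoneme sequence uses only mother-tongue symbols, a single
monolingual recognizer can search all of them without a multilingual model.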

Whereas a look-up method in a lexicon configuration fails with mobile terminal
devices because of the large memory requirement, and in multilingual language
recognition new Hidden Markov models have to be created and optimized for each
new language once the set of languages has been optimized, the
grapheme/phoneme conversion into several languages in accordance with the
invention creates a multilingual system that can be implemented with relatively
simple means. In addition to the grapheme-to-phoneme conversion, a mapping,
i.e. a depiction between the individual languages, is implemented. The phoneme
sequence determination and the succeeding mapping or depiction normally run
offline on a device, for example a mobile telephone, a personal digital assistant
or personal computer with corresponding software, and are therefore not time
critical. The resources required for this can be held in an internal/external
memory.

Because the language recognition vocabulary created by means of the
aforementioned procedure includes N pronunciation variants for each word, the
search effort during language recognition can be great. To reduce this, a further
step can be introduced under the exemplary embodiment, performed before the
creation of the language recognition vocabulary and after generation of the N
second phoneme sequences per word. In this step, the N second phoneme sequences
corresponding to the N second pronunciation variants of each word are
processed: each second phoneme sequence is analyzed and classified by means of
a suitable distance measure, particularly the Levenshtein distance, and the N
second phoneme sequences of each word are reduced to a few, preferably two to
three, phoneme sequences, particularly by omitting the pronunciation variants
that are least similar to the pronunciation variants of the mother tongue. Simply
expressed, the least important pronunciation variants are omitted by this
reduction, thus reducing the search effort during language recognition.
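
The reduction step described above can be illustrated as follows. This is a
minimal sketch, not the applicant's implementation: it assumes that the
mother-tongue transcription of the word serves as the reference sequence, that
phoneme sequences are plain lists of symbols, and that ties are resolved by
input order. The function names and the example symbols are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between two phoneme sequences (lists of symbols)."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def reduce_variants(variants, reference, keep=3):
    """Keep the `keep` variants closest to the mother-tongue reference."""
    ranked = sorted(variants, key=lambda v: levenshtein(v, reference))
    return ranked[:keep]


# Invented second phoneme sequences for one word.
reference = ["S", "i", "R", "a", "k"]
variants = [
    ["S", "i", "R", "a", "k"],
    ["k", "i", "R", "a", "k"],
    ["tS", "i", "r", "a", "tS"],
    ["x", "i", "R", "a", "k"],
]
kept = reduce_variants(variants, reference, keep=3)
```

Here the third variant differs from the reference in three positions and is the
one dropped; the other three are kept.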

A further reduction in cost can be achieved by carrying out a language
identification and reduction before the first step. As part of this language
identification, the probability of each word to be recognized belonging to each
of the N various languages is determined. Using the results of this language
identification, the number of languages to be processed in the first step of the
method is reduced, preferably to two or three different languages. The
languages with the least probability are not further processed. For a specific
word, the result of the language identification can, for example, be as follows:
"German 55%, UK English 16%, US English 14%, Swedish 3%, ...". Under this
example, if only three languages are desired, the Swedish language is omitted,
i.e. not further processed.
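
As a small illustration of the language reduction, the example figures above
can be ranked and truncated as shown below. The function name reduce_languages
and the dictionary layout are assumptions made for this sketch; the listed
probabilities do not sum to one because the source example is itself
abbreviated.

```python
def reduce_languages(probabilities, keep=3):
    """Return the `keep` most probable languages for one word."""
    ranked = sorted(probabilities, key=probabilities.get, reverse=True)
    return ranked[:keep]


# Probabilities taken from the example in the text (list is abbreviated).
scores = {
    "German": 0.55,
    "UK English": 0.16,
    "US English": 0.14,
    "Swedish": 0.03,
}
selected = reduce_languages(scores, keep=3)
```

With keep=3, Swedish falls outside the selection, so no phonetic transcript is
generated for it in the first step.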

The determination of the phonetic transcripts in the first step of the method
preferably takes place by means of at least one neural network. Neural networks
have proved suitable for determining phonetic transcripts from written words,
because they produce good results with regard to accuracy, and particularly with
regard to the speed of processing, and can be easily implemented, particularly in
software.

A Hidden Markov model, particularly one that has been created for the 
language defined as a mother tongue, is particularly suitable for use as a mother 
tongue language recognizer. 

The exemplary embodiment of the invention relates to a language recognizer
for speaker-independent language recognition of words from various languages,
particularly for recognizing names from various languages. In this case, one of
the various languages is defined as the mother tongue. The language recognizer
includes:

a mother tongue language recognizer,

a first processing module for determining the phonetic transcripts of
words, particularly for N various languages, in order to obtain N first phoneme
sequences corresponding to N first pronunciation variants per word,

a second processing module for implementing a mapping of the
phonemes of each language onto the particular phoneme set of the mother tongue,

a third processing module for applying the mapping, implemented by
the second processing module, to the N first phoneme sequences for each word,
determined with the first processing module, whereby N second phoneme sequences
corresponding to N second pronunciation variants are obtained per word, that can
be recognized by the mother tongue language recognizer, and

a fourth processing module for creating a language recognition
vocabulary with the N second phoneme sequences per word obtained by the third
processing module for the mother tongue language recognizer.

Under a preferred form of the embodiment, the automatic language
recognizer has a fifth processing module for processing the N second phoneme
sequences corresponding to the N second pronunciation variants of each word. The
fifth processing module is designed in such a way that each second phoneme
sequence is analyzed and classified using a suitable distance measure,
particularly the Levenshtein distance, and the N second phoneme sequences of
each word are reduced to a few, preferably two to three, phoneme sequences.

Furthermore, the automatic language recognizer can have a language
identifier and a language reducer. The language identifier is connected before the
first processing module and, for each word to be recognized, it determines the
probability of it belonging to each of the N different languages. The language
reducer reduces the number of languages to be processed by the first processing
module, preferably down to two to three different languages, so that the
languages with the least probability are not further processed. The language
identifier and language reducer substantially reduce the processing effort of
the automatic language recognizer, both in the input phase and in the recognition
phase.

Preferably, the first processing module has at least one neural network for 
determining the phonetic transcripts.

Furthermore, the mother tongue language recognizer has, in a
preferred form of embodiment, a Hidden Markov model that has been created for
the language defined as the mother tongue.

Turning to FIG. 1, a speaker-related name is selected on a mobile
telephone using the names from a telephone book, for a German-speaking user.
In the telephone book there are, in addition to the mainly German-language names,
also some foreign-language names. A transcriber for the graphemic representation
of the names is set for the German, Italian, Czech, Greek and Turkish languages,
i.e. N = 5 different languages overall.

In an initial step S0 of FIG. 1, a language identification of the supplied
words 10 or entries in the telephone book is undertaken. More precisely, each
individual word is analyzed with regard to the probability of it belonging to one of
the five languages. If, for example, a German name is being processed, the
probability for German is very high. For the other four languages, i.e. Italian,
Czech, Greek and Turkish, the probability is much lower. Using the
probabilities determined per word, the language with the lowest probability is
omitted during subsequent processing. As an example, this means
that in the succeeding processing operation there are then only four, instead of five,
languages that have to be processed.

In a first step S1 of FIG. 1, the phonetic transcript for each
word is determined for each of the four different languages. In this way, four
phoneme sequences corresponding to the four first pronunciation variants are
obtained for each word.

In a second step S2 of FIG. 1, a mapping of the phonemes of
each of the four languages onto the particular phoneme set of the
mother tongue is implemented.

In a third step S3 of FIG. 1, this mapping is applied to the
four first phoneme sequences 12 obtained in the first step S1. In this
way, four second phoneme sequences 14 corresponding to the four second
pronunciation variants are obtained for each word. The four second phoneme
sequences 14 can already be recognized in a mother tongue language recognizer.

Furthermore, to reduce the processing effort for the language
recognizer further, each second phoneme sequence is analyzed and classified for
each word using the Levenshtein distance (step S4). A fifth step S5 then
takes place, in which the analyzed and classified second phoneme sequences per
word are reduced to three phoneme sequences.

Finally, in a last step S6, a language recognition vocabulary is created for
the mother tongue language recognizer with the three second phoneme sequences
per word obtained in the fifth step S5. By still further reducing
the phoneme sequences in the fifth step S5, the language recognition
vocabulary to be saved and to be analyzed during a language recognition process is
substantially reduced. In a practical application of the language recognizer, this has
the advantage of a lower storage capacity requirement
and also of faster processing, because the vocabulary to be
searched through is smaller.

After the described procedure has been completed, the user can, by means
of language recognition, make a name selection, i.e. make a language-controlled
call up of stored telephone numbers using the name of the subscriber, without
having to explicitly pronounce the name of the subscriber to be called
beforehand, i.e. without having to "train".

Furthermore, if a user finds that a certain name is not well recognized, the
user can call up the language recognition menu of his mobile telephone and then
select a "name selection" application. By means of this application, the user
can now be offered one, or several, ways of improving the language recognition
of a certain word, or more precisely of a certain name, from the electronic
telephone book of the mobile telephone. Some of these possibilities are briefly
explained in the following by way of example.

1. As an alternate embodiment, the user can again speak the poorly
recognized or unrecognized word into the mobile telephone and then have it
converted into a phoneme sequence by means of the language recognizer contained
in the mobile telephone. In this case, pronunciation variants previously
automatically determined are either completely or partially removed from the
vocabulary of the language recognizer, depending on their closeness to the newly
determined phoneme sequence.

2. As yet another alternate embodiment, the user can have
a kind of phonetic transcription of the poorly recognized or unrecognized entry in
the electronic telephone book shown on the display of the mobile telephone.
As an example, if there is a poor match to the user's
pronunciation, the user can edit the kind of phonetic transcription. For example, by
an automatic transcription of the entry "Jacques Chirac", "Jakwes Shirak" can be
stored as a phonetic transcription. If this phonetic transcription now appears
incorrect to the user, he can edit it using his mobile telephone, for example to
"Zhak Shirak". The system can then determine the phonetic description again and
re-enter this in the language recognition vocabulary. This should enable the
automatic language recognition to function reliably.

3. Also, the user can substantially improve the recognition by explicitly
specifying the language from which a poorly recognized or even unrecognized
name originates. In such a case, all the pronunciation variants of the name that
are not assigned to the explicitly specified language are removed from the
language recognition vocabulary.
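
The third possibility above amounts to a simple filter over the stored
pronunciation variants. The sketch below assumes, purely for illustration, that
each vocabulary entry keeps the source language next to each second phoneme
sequence; this layout, the function name restrict_to_language and the phoneme
symbols are hypothetical and are not specified in the application.

```python
# Hypothetical vocabulary layout: each name maps to a list of
# (source language, second phoneme sequence) pairs.
vocabulary = {
    "Chirac": [
        ("de", ["S", "i", "R", "a", "k"]),
        ("fr", ["S", "i", "R", "a", "k"]),
        ("it", ["k", "i", "R", "a", "k"]),
    ],
}


def restrict_to_language(vocabulary, name, language):
    """Drop every variant of `name` not assigned to `language`."""
    vocabulary[name] = [
        (lang, seq) for lang, seq in vocabulary[name] if lang == language
    ]


# The user states that the name is French.
restrict_to_language(vocabulary, "Chirac", "fr")
```

After the call only the variant derived from the specified language remains, so
the recognizer searches a single sequence for that name.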

In addition, although the invention is described in connection
with mobile telephones, it should be readily apparent that the invention may be
practiced with any type of communicating device, such as a personal assistant or a
PC. It is also understood that the device portions and segments described in the
embodiments above can be substituted with equivalent devices to perform the
disclosed methods and processes. Accordingly, the invention is not limited by the
foregoing description or drawings, but is only limited by the scope of the appended
claims.

ABSTRACT OF THE DISCLOSURE 

The invention relates to an operating method for an automated language
recognizer intended for the speaker-independent language recognition of words
(10) from different languages, particularly for recognizing names from different
languages. The method is based on a language defined as the mother tongue
and has an input phase for establishing a language recognizer vocabulary. Phonetic
transcripts are determined for words in various languages in order to obtain
phoneme sequences for pronunciation variants. The phonemes of each relevant
phoneme set of the mother tongue are then specifically mapped to determine
phoneme sequences that correspond to pronunciation variants.

This listing of claims will replace all prior versions, and listings, of claims
in the application:
Listing of claims: 

Claims 1-11. (canceled)

Claim 12. (new) A method for automated language recognition of
words from different languages, said method comprising the steps of:

(a) loading a phoneme set associated with a language specified as a mother
tongue into a mother tongue language recognizer;

(b) determining the phonetic transcripts of each word for N various
languages not specified as the mother tongue to obtain N first phoneme sequences
for each word corresponding to N first pronunciation variants;

(c) calculating a phoneme map by mapping the first phoneme sequences of
each of said N languages to a relevant phoneme set of the mother tongue;

(d) determining N second phoneme sequences corresponding to N second
pronunciation variants from said phoneme map for each word; and

(e) processing said N second phoneme sequences with the phoneme set
associated with the language specified as a mother tongue to identify matching or
similar words.

Claim 13. (new) The method according to Claim 12, further
comprising a step of adding the N second phoneme sequences for each word in a
language recognition vocabulary located in the mother tongue language recognizer.

Claim 14. (new) The method according to Claim 12, further
comprising the step of processing the N second phoneme sequences to determine
distances to the N second pronunciation variants.

Claim 15. (new) The method according to Claim 14, further
comprising a step of classifying each of the N second phoneme sequences to
identify respective distances.

Claim 16. (new) The method according to Claim 15, further
comprising a step of eliminating any N second phoneme sequences that do not meet
or exceed a predetermined threshold.

Claim 17. (new) The method according to Claim 16, wherein the
distances are Levenshtein distances.

Claim 18. (new) The method according to Claim 12, further
comprising the step of determining the probabilities that each word for N various
languages not specified as the mother tongue belongs to a specified set of languages,
said step of determining probabilities occurring before step (a).

Claim 19. (new) The method according to Claim 18, further
comprising the step of eliminating languages from said specified set that do not
meet or exceed a predetermined threshold.

Claim 20. (new) The method according to Claim 12, wherein the step 
of determining the phonetic transcripts of each word for N various languages not 
specified as the mother tongue is performed by at least one neural network. 

Claim 21. (new) The method according to Claim 12, wherein 
processing said N second phoneme sequences with the phoneme set associated with 
the language specified as a mother tongue is performed via a Hidden Markov 
Model. 

Claim 22. (new) An automatic language recognizing apparatus,
receiving words from various languages, comprising:

a mother tongue language recognizer, said recognizer storing a phoneme set
of a predetermined mother tongue;

a first processing module for determining the phonetic transcripts of words
from N various languages in order to obtain N first phoneme sequences for each
word corresponding to N first pronunciation variants;

a second processing module for implementing a mapping of the phonemes
of each of the N languages to a particular phoneme set of the mother tongue;

a third processing module for applying the mapping, implemented by means
of the second processing module, to the N first phoneme sequences for each word
determined by means of the first processing module, with N second phoneme
sequences corresponding to N second pronunciation variants being obtained per
word, that can be recognized by means of the mother tongue language recognizer;
and

a fourth processing module for creating a language recognizable vocabulary
with the N second phoneme sequences per word, obtained by the third processing
module, for the mother tongue language recognizer.

Claim 23. (new) The automatic language recognizing apparatus
according to claim 22, further comprising a fifth processing module for processing
the N second phoneme sequences corresponding to the N second pronunciation
variants of each word to obtain distances for each N second phoneme sequence.

Claim 24. (new) The automatic language recognizing apparatus
according to claim 23, wherein said distances are Levenshtein distances.

Claim 25. (new) The automatic language recognizing apparatus
according to claim 24, wherein the N second phoneme sequence distances not
meeting or exceeding a predetermined threshold are eliminated from further
processing.

Claim 26. (new) The automatic language recognizing apparatus
according to claim 22, further comprising a language identifier, coupled to the first
processing module, wherein the language identifier determines a probability of each
word belonging to each of the N different languages.

Claim 27. (new) The automatic language recognizing apparatus 
according to claim 26, further comprising a language reducer that reduces the 
number of languages from the first processing module to be processed if said 
probability does not meet or exceed a predetermined threshold.

Claim 28. (new) The automatic language recognizing apparatus 
according to claim 22, wherein the first processing module comprises at least one 
neural network for determining the phonetic transcripts. 

Claim 29. (new) The automatic language recognizing apparatus 
according to claim 22, wherein the mother tongue language recognizer comprises a 
Hidden Markov model that has been created for the language defined as the mother 
tongue.

Amendment to the Drawings:

The attached sheet of drawings includes changes to FIG. 1. This sheet
replaces the original sheet showing FIG. 1. The Drawings were amended to include
the heading "FIG. 1" as shown.

Attachment: Replacement Sheet